Third Year Report

Author

  • Yanbo J. Wang
Abstract

My PhD research focuses on Text Mining, a major research area within Knowledge Discovery in Databases (KDD), and in particular on language-independent Documentbase Pre-processing (DPP) for the classification / categorisation of documents, referred to as Text Classification (TC), using novel algorithms to identify hidden patterns, rules, regularities and/or trends within these documents. Significant techniques from Data Mining, the classical KDD research area that parallels Text Mining, such as Classification Rule Mining (CRM) and Association Rule Mining (ARM), support this research, especially when dealing with a very large documentbase. One way to understand the TC framework is to split it into DPP plus CRM. When Classification Association Rule Mining (CARM), a well-established intersection of CRM and ARM, is applied to TC, (1) a large volume of textual data can be handled (a given documentbase usually consists of more than 10,000 documents, each containing hundreds of words), and (2) a small amount of noisy data can be tolerated (e.g., a number of misspelled and/or cross-language words in documents). Based on (2), this approach can be regarded as language-insensitive, or language-independent, in some respects, while acknowledging that alternative approaches exist. With regard to the CARM-based approach, developing a TC-oriented Language-independent DPP (TC-LI-DPP) approach will consequently yield an “almighty” TC approach, one that has general applicability regardless of the language(s) in which the documentbase to be classified is presented. This report, which details the work done in the third year of my PhD research, describes and compares a number of language-independent DPP techniques that support single-label N-class TC.
The discussion focuses on the vector space / “bag-of-*” model, while acknowledging that alternative approaches to language-independent DPP exist. A simple but effective statistical approach to identifying key / significant words is proposed, which in turn is coupled with a number of phrase identification mechanisms. The emphasis in all cases is on language-independence, so that the techniques described have general applicability regardless of the language(s) in which the documentbase to be mined is presented.
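To make the idea of statistical, language-independent significant word selection in a bag-of-words setting more concrete, the following is a minimal sketch. It is not the method proposed in the report, whose details the abstract does not give. The selection rule (a word's in-class document rate divided by its overall document rate), the thresholds `min_ratio` and `min_docs`, and all function names are assumptions made purely for illustration. The tokeniser uses no stop-word list, stemmer or dictionary, but it does assume that words are delimited by non-letter characters.

```python
import re
from collections import Counter, defaultdict

# Unicode-aware token pattern: runs of letters in any script.
# Assumption: words are separated by non-letter characters.
TOKEN = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Split text into lower-cased letter runs, with no language resources."""
    return [t.lower() for t in TOKEN.findall(text)]


def significant_words(docs, min_ratio=2.0, min_docs=2):
    """Pick words that are over-represented in one class.

    docs is a list of (text, label) pairs. A word is kept for a class when
    (fraction of that class's documents containing it) divided by
    (fraction of all documents containing it) is at least min_ratio.
    This rule is a hypothetical stand-in, not the report's statistic.
    """
    class_sizes = Counter(label for _, label in docs)
    doc_freq = defaultdict(Counter)  # word -> label -> document count
    for text, label in docs:
        for word in set(tokenize(text)):
            doc_freq[word][label] += 1

    n_docs = len(docs)
    selected = defaultdict(set)
    for word, per_class in doc_freq.items():
        total = sum(per_class.values())
        if total < min_docs:
            continue  # too rare to be statistically meaningful
        overall_rate = total / n_docs
        for label, count in per_class.items():
            in_class_rate = count / class_sizes[label]
            if in_class_rate / overall_rate >= min_ratio:
                selected[label].add(word)
    return dict(selected)


def to_bag(text, vocabulary):
    """Represent a document as a bag (multiset) of the selected words only."""
    return Counter(t for t in tokenize(text) if t in vocabulary)


docs = [
    ("the goal was scored in the match", "sport"),
    ("the match ended after a late goal", "sport"),
    ("the bank raised the interest rate", "finance"),
    ("the interest rate fell at the bank", "finance"),
]
keywords = significant_words(docs)
vocabulary = set().union(*keywords.values())
bag = to_bag("The goal in the match", vocabulary)
```

In this toy documentbase, a word such as "the" occurs in every document of every class, so its ratio is 1 and it is filtered out without any language-specific stop-word list. The resulting bags could then be passed to a rule-based classifier as transaction-style records.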





Publication date: 2006